Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models learned from data generated by the user's Inertial Measurement Unit (IMU). Most approaches for instance-based HAR have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or a combination of the two to achieve state-of-the-art results with real-time performance. Recently, the Transformer architecture, first in the language processing domain and then in the vision domain, has pushed the state of the art further beyond classical architectures. However, such Transformers are heavyweight in computational resources, which makes them ill-suited for the embedded HAR applications found in the pervasive computing domain. In this study, we present the Human Activity Recognition Transformer (HART), a lightweight, sensor-wise Transformer architecture specifically adapted to the domain of IMUs embedded on mobile devices. Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer FLoating-point Operations Per Second (FLOPS) and fewer parameters while outperforming the current state-of-the-art results. Furthermore, we present evaluations across various architectures of their performance in heterogeneous environments and show that our models can better generalize to different sensing devices or on-body positions.
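As a rough illustration of the sensor-wise idea, the sketch below (PyTorch, with hypothetical layer sizes; it is not the authors' exact HART configuration) embeds accelerometer and gyroscope frames separately, runs a small Transformer encoder per sensor, and fuses the pooled representations for classification.

```python
import torch
import torch.nn as nn

def branch(dim: int, heads: int, depth: int) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       dim_feedforward=2 * dim,
                                       batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class SensorWiseHAR(nn.Module):
    # Hypothetical sensor-wise encoder: accelerometer and gyroscope
    # frames are embedded and attended to separately, then fused for
    # classification, keeping the parameter count small.
    def __init__(self, dim=64, heads=4, depth=2, classes=6):
        super().__init__()
        self.embed_acc = nn.Linear(3, dim)   # 3-axis accelerometer
        self.embed_gyr = nn.Linear(3, dim)   # 3-axis gyroscope
        self.enc_acc = branch(dim, heads, depth)
        self.enc_gyr = branch(dim, heads, depth)
        self.head = nn.Linear(2 * dim, classes)

    def forward(self, acc, gyr):             # (batch, frames, 3) each
        za = self.enc_acc(self.embed_acc(acc)).mean(dim=1)  # pool over time
        zg = self.enc_gyr(self.embed_gyr(gyr)).mean(dim=1)
        return self.head(torch.cat([za, zg], dim=-1))

acc, gyr = torch.randn(8, 128, 3), torch.randn(8, 128, 3)  # 8 windows of 128 frames
print(SensorWiseHAR()(acc, gyr).shape)                     # torch.Size([8, 6])
```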
Federated learning is a new machine learning paradigm involving distributed model learning on independent devices. Among its many advantages, federated learning keeps the training data on the devices (e.g., smartphones) and shares only the learned models with a centralized server. In the case of supervised learning, labeling is entrusted to the clients. However, for many tasks such as human activity recognition, acquiring such labels can be prohibitively expensive and error-prone. Consequently, a large amount of data remains unlabeled and unexploited. Most existing federated learning approaches focus on supervised learning and largely ignore this unlabeled data. Furthermore, it is unclear whether standard federated learning approaches are suited to self-supervised learning. The few studies that have addressed this question are limited to the favorable situation of homogeneous datasets. This work lays the groundwork for a reference evaluation of federated self-supervised learning in realistic settings. We show that a standard lightweight autoencoder combined with standard federated averaging fails to learn a robust representation for human activity recognition on several realistic heterogeneous datasets. These findings advocate a deeper research effort on federated self-supervised learning to exploit the mass of heterogeneous unlabeled data present on mobile devices.
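For reference, standard federated averaging, the aggregation rule this evaluation builds on, can be sketched as follows; the dataset-size weighting is the usual FedAvg rule, and floating-point parameters are assumed.

```python
import torch

def federated_average(client_states, client_sizes):
    """Standard FedAvg: weight each client's parameters by its local
    dataset size, then sum. client_states is a list of state_dicts
    with identical keys; assumes floating-point parameters."""
    total = float(sum(client_sizes))
    return {key: sum(s[key] * (n / total)
                     for s, n in zip(client_states, client_sizes))
            for key in client_states[0]}

# Toy usage: average two clients holding 100 and 300 local samples.
m1 = torch.nn.Linear(4, 2).state_dict()
m2 = torch.nn.Linear(4, 2).state_dict()
global_state = federated_average([m1, m2], [100, 300])
```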
Federated learning has been introduced as a new machine learning paradigm enhancing the use of local devices. At the server level, FL regularly aggregates models learned locally on distributed clients to obtain a more general model. Current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server. Such a setting is not realistic in mobile pervasive computing, where data storage must be kept low and data characteristics can change dramatically. To account for this variability, a solution is to use the data regularly collected by the client to progressively adapt the received model. But this naive approach exposes clients to the well-known problem of catastrophic forgetting. To address this problem, we define a federated continual learning approach based mainly on distillation. Our approach makes better use of resources, eliminating the need to retrain from scratch when new data arrives and reducing memory usage by limiting the amount of data to be stored. This proposal has been evaluated in the Human Activity Recognition (HAR) domain and has been shown to effectively reduce the catastrophic forgetting effect.
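A minimal sketch of the distillation idea follows, assuming a frozen copy of the previous model acts as teacher when training on newly collected data; the temperature and mixing weight are illustrative, not the paper's exact settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5):
    """Continual-update objective: cross-entropy on the new data plus
    a KL term keeping the updated model close to the previous (frozen)
    model's soft predictions, which limits forgetting."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```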
Federated learning has been introduced as a new machine learning paradigm enhancing the use of local devices. At the server level, FL regularly aggregates models learned locally on distributed clients to obtain a more general model. This way, no private data is sent over the network and communication costs are reduced. However, current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server. Such a setting is not realistic in mobile pervasive computing, where data storage must be kept low and data characteristics (distributions) can change dramatically. To account for this variability, a solution is to use the data regularly collected by the client to progressively adapt the received model. But this naive approach exposes clients to the well-known problem of catastrophic forgetting. The purpose of this paper is to demonstrate this problem in the context of mobile human activity recognition on smartphones.
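The failure mode itself can be reproduced in a few lines on synthetic data: naively fine-tuning on data from new classes only makes performance on earlier classes collapse. This toy demo merely stands in for the smartphone HAR experiments and shares nothing with their exact setup.

```python
import torch
from torch import nn

def fit(model, x, y, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

def accuracy(model, x, y):
    return (model(x).argmax(1) == y).float().mean().item()

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
# Task A: classes 0 and 1; Task B: classes 2 and 3.
xa = torch.cat([torch.randn(100, 8) + 2, torch.randn(100, 8) - 2])
ya = torch.cat([torch.zeros(100, dtype=torch.long),
                torch.ones(100, dtype=torch.long)])
xb = torch.cat([torch.randn(100, 8) + 4, torch.randn(100, 8) - 4])
yb = torch.cat([torch.full((100,), 2, dtype=torch.long),
                torch.full((100,), 3, dtype=torch.long)])
fit(model, xa, ya)
print("task A after A:", accuracy(model, xa, ya))  # close to 1.0
fit(model, xb, yb)                                 # naive update on new data only
print("task A after B:", accuracy(model, xa, ya))  # typically collapses
```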
Over the past decade, neural networks have been successful at making predictions from biological sequences, especially in the context of regulatory genomics. As in other fields of deep learning, tools have been devised to extract features such as sequence motifs that can explain the predictions made by a trained network. Here we intend to go beyond explainable machine learning and introduce SEISM, a selective inference procedure to test the association between these extracted features and the predicted phenotype. In particular, we discuss how training a one-layer convolutional network is formally equivalent to selecting motifs maximizing some association score. We adapt existing sampling-based selective inference procedures by quantizing this selection over an infinite set to a large but finite grid. Finally, we show that sampling under a specific choice of parameters is sufficient to characterize the composite null hypothesis typically used for selective inference, a result that goes well beyond our particular framework. We illustrate the behavior of our method in terms of calibration, power, and speed, and discuss its power/speed trade-off with a simpler data-split strategy. SEISM paves the way to an easier analysis of neural networks used in regulatory genomics, and to more powerful methods for genome-wide association studies (GWAS).
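The simpler data-split baseline mentioned above is easy to state in code: select the highest-scoring feature on one half of the samples, then test the association on the held-out half, so the selection cannot bias the test. The scoring function below is a generic placeholder, not SEISM's association score.

```python
import numpy as np
from scipy import stats

def data_split_test(features, phenotype, score_fn, seed=0):
    """Naive selective inference by data splitting: motif selection on
    one half of the samples, association testing on the other half."""
    rng = np.random.default_rng(seed)
    n = len(phenotype)
    idx = rng.permutation(n)
    sel, test = idx[: n // 2], idx[n // 2:]
    # Pick the feature (e.g. a motif activation) with the highest
    # score on the selection half only.
    scores = [score_fn(features[sel, j], phenotype[sel])
              for j in range(features.shape[1])]
    best = int(np.argmax(scores))
    # A plain Pearson test is valid on the held-out half because the
    # selection step never saw these samples.
    r, pval = stats.pearsonr(features[test, best], phenotype[test])
    return best, pval

X, y = np.random.randn(200, 50), np.random.randn(200)
motif, p = data_split_test(X, y, lambda f, t: abs(np.corrcoef(f, t)[0, 1]))
```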
In intensively managed forests in Europe, where forests are divided into stands of small size and may show heterogeneity within stands, a high spatial resolution (10 - 20 meters) is arguably needed to capture the differences in canopy height. In this work, we developed a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map over the "Landes de Gascogne" forest in France, a large maritime pine plantation of 13,000 km$^2$ with flat terrain and intensive management. This area is characterized by even-aged and mono-specific stands, with a typical length of a few hundred meters, harvested every 35 to 50 years. Our deep learning U-Net model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input to predict tree height derived from GEDI waveforms. The evaluation is performed with external validation data from forest inventory plots and a stereo 3D reconstruction model based on Skysat imagery available at specific locations. We trained seven different U-Net models on combinations of Sentinel-1 and Sentinel-2 bands to evaluate the importance of each instrument in the dominant height retrieval. The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020 with a mean absolute error of 2.02 m on the test dataset. The best predictions were obtained using all available satellite layers from Sentinel-1 and Sentinel-2, but using only one satellite source also provided good predictions. For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.
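As a structural illustration only, a miniature U-Net for per-pixel height regression from stacked Sentinel bands might look like the sketch below; the band count and layer widths are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Two-level U-Net regressing one height value per pixel from
    stacked Sentinel-1/Sentinel-2 composites (band count assumed)."""
    def __init__(self, bands=12):
        super().__init__()
        self.enc1, self.enc2 = block(bands, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)          # skip connection doubles channels
        self.out = nn.Conv2d(32, 1, 1)    # metres of dominant height

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d)

pred = TinyUNet()(torch.randn(1, 12, 64, 64))
print(pred.shape)   # torch.Size([1, 1, 64, 64]) -> one height per pixel
```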
Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
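A minimal sketch of the SWA adaptation, using PyTorch's built-in averaging utilities on a toy stand-in for a fine-tuned PLM; the epoch at which averaging starts and the learning rates are illustrative.

```python
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
swa_model = AveragedModel(model)        # keeps a running weight average
swa_scheduler = SWALR(optimizer, swa_lr=5e-4)
swa_start = 2                           # start averaging after this epoch

x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))  # synthetic data
for epoch in range(5):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)   # accumulate the average
        swa_scheduler.step()
# swa_model now holds the averaged weights, which tend to sit in a
# flatter minimum and generalize better than the last iterate.
```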
This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security and safety critical applications, such as the use of AI models for classification of traffic signs in self-driving cars. We approach the problems by introducing theoretical underpinnings of AI model utilization measurement and understanding patterns in utilization-based class encodings of traffic signs at the level of computation graphs (AI models), subgraphs, and graph nodes. Conceptually, utilization is defined at each graph node (computation unit) of an AI model based on the number and distribution of unique outputs in the space of all possible outputs (tensor-states). In this work, utilization measurements are extracted from AI models, which include poisoned and clean AI models. In contrast to clean AI models, the poisoned AI models were trained with traffic sign images containing systematic, physically realizable, traffic sign modifications (i.e., triggers) to change a correct class label to another label in the presence of such a trigger. We analyze class encodings of such clean and poisoned AI models, and conclude with implications for trojan injection and detection.
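One way to make the utilization notion concrete is sketched below: quantize a node's outputs into bins and measure how many distinct tensor-states actually occur across inputs. The binning scheme and the ratio used here are assumptions, not the paper's exact definition.

```python
import numpy as np

def node_utilization(activations, bins=8):
    """Illustrative utilization measure for one computation node:
    quantize its outputs and count the unique tensor-states that occur,
    relative to the number of inputs observed."""
    # activations: (num_inputs, num_units) outputs of one graph node
    lo, hi = activations.min(), activations.max()
    q = np.floor((activations - lo) / (hi - lo + 1e-9) * bins).astype(int)
    states = {tuple(row) for row in q}       # unique quantized tensor-states
    return len(states) / len(activations)    # fraction of distinct states

acts = np.random.randn(1000, 4)
print(node_utilization(acts))  # near 1.0: almost every input has its own state
```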
White matter (WM) bundle segmentation is a cornerstone of modern tractography to study the brain's structural connectivity in domains such as neurological disorders, neurosurgery, and aging. In this study, we present FIESTA (FIber gEneration and bundle Segmentation in Tractography using Autoencoders), a reliable and robust, fully automated, and easily semi-automatically calibrated pipeline based on deep autoencoders that can dissect and fully populate WM bundles. Our framework allows the transition from one anatomical bundle definition to another with marginal calibration time. This pipeline is built upon the FINTA, CINTA, and GESTA methods, which demonstrated how autoencoders can be used successfully for streamline filtering, bundling, and streamline generation in tractography. Our proposed method improves bundling coverage by recovering hard-to-track bundles with generative sampling through the latent space seeding of the subject bundle and the atlas bundle. A latent space of streamlines is learned using autoencoder-based modeling combined with contrastive learning. Using an atlas of bundles in standard space (MNI), our proposed method segments new tractograms using the autoencoder latent distance between each tractogram streamline and its closest neighbor bundle in the atlas of bundles. Intra-subject bundle reliability is improved by recovering hard-to-track streamlines, using the autoencoder to generate new streamlines that increase each bundle's spatial coverage while remaining anatomically meaningful. Results show that our method is more reliable than state-of-the-art automated virtual dissection methods such as RecoBundles, RecoBundlesX, TractSeg, White Matter Analysis, and XTRACT. Overall, these results show that our framework improves the practicality and usability of current state-of-the-art bundling frameworks.
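The latent-distance assignment step can be illustrated with a simple nearest-neighbor rule, assuming streamlines have already been embedded by the autoencoder; the Euclidean metric and rejection threshold below are placeholders, not the calibrated values used by the pipeline.

```python
import numpy as np

def assign_streamlines(latent_streamlines, atlas_latents, atlas_labels,
                       threshold=2.0):
    """Nearest-neighbor bundle assignment in an autoencoder latent
    space: each subject streamline takes the label of its closest atlas
    streamline, or is left unassigned beyond a distance threshold."""
    labels = []
    for z in latent_streamlines:
        d = np.linalg.norm(atlas_latents - z, axis=1)
        j = int(np.argmin(d))
        labels.append(atlas_labels[j] if d[j] <= threshold else None)
    return labels

subject = np.random.randn(5, 32)               # 5 streamlines, 32-d latents
atlas = np.random.randn(100, 32)
print(assign_streamlines(subject, atlas, ["AF"] * 50 + ["CST"] * 50))
```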
There are many potential benefits to news readers accessing diverse sources. Modern news aggregators do the hard work of organizing the news, offering readers a plethora of source options, but choosing which source to read remains challenging. We propose a new framework to assist readers in identifying source differences and gaining an understanding of news coverage diversity. The framework is based on the generation of Discord Questions: questions with a diverse answer pool, explicitly illustrating source differences. To assemble a prototype of the framework, we focus on two components: (1) discord question generation, the task of generating questions answered differently by sources, for which we propose an automatic scoring method, and create a model that improves performance from current question generation (QG) methods by 5%, (2) answer consolidation, the task of grouping answers to a question that are semantically similar, for which we collect data and repurpose a method that achieves 81% balanced accuracy on our realistic test set. We illustrate the framework's feasibility through a prototype interface. Even though model performance at discord QG still lags human performance by more than 15%, generated questions are judged to be more interesting than factoid questions and can reveal differences in the level of detail, sentiment, and reasoning of sources in news coverage.
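For the answer consolidation component, a simple stand-in is to embed the answers and cluster them; the TF-IDF representation and distance threshold below are illustrative, not the repurposed method evaluated in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

def consolidate_answers(answers, distance_threshold=0.8):
    """Group answers to the same question by textual similarity.
    TF-IDF plus agglomerative clustering with cosine distance; `metric`
    replaced `affinity` in recent scikit-learn versions."""
    vecs = TfidfVectorizer().fit_transform(answers).toarray()
    clusterer = AgglomerativeClustering(n_clusters=None,
                                        distance_threshold=distance_threshold,
                                        metric="cosine", linkage="average")
    return clusterer.fit_predict(vecs)

answers = ["The bill passed 52-48.",
           "It was approved by a 52-48 vote.",
           "Critics say the bill harms small businesses."]
print(consolidate_answers(answers))   # e.g. [0, 0, 1]: two answer groups
```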